Deep-learning segmentation to select liver parenchyma for categorizing hepatic steatosis on multinational chest CT

Unenhanced CT scans exhibit high specificity in detecting moderate-to-severe hepatic steatosis. Although many CT scans are acquired during health screening and in various diagnostic contexts, their potential for hepatic steatosis detection has remained largely unexplored. The accuracy of previous methodologies has been limited by the inclusion of non-parenchymal liver regions. To overcome this limitation, we present a deep-learning (DL) based method tailored for the automatic selection of parenchymal portions in CT images. The method automatically delineates circular regions for detecting hepatic steatosis. We used 1,014 multinational CT images to develop a DL model for segmenting the liver and selecting the parenchymal regions. The results demonstrate outstanding performance in both tasks. By excluding non-parenchymal portions, our DL-based method overcomes the limitation of previous approaches, achieving radiologist-level accuracy in liver attenuation measurements and hepatic steatosis detection. To ensure reproducibility, we have openly shared the 1,014 annotated CT images and the DL system code. Our research contributes to refining automated methods for detecting hepatic steatosis on CT images, enhancing the accuracy and efficiency of healthcare screening processes.
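
To illustrate why excluding non-parenchymal pixels matters for attenuation measurement, the following is a minimal sketch and not the DL pipeline described above. It assumes a 2D slice of Hounsfield unit (HU) values and a binary parenchyma mask are already available; the function name, the mask, and the ROI centre and radius are all hypothetical. A circular ROI is accepted only if it lies entirely inside the parenchyma mask, and its mean HU is then returned.

```python
from statistics import mean


def circular_roi_mean_hu(hu_slice, parenchyma_mask, center, radius):
    """Return the mean HU inside a circular ROI, or None if the ROI
    leaves the image or touches any non-parenchymal pixel.

    hu_slice and parenchyma_mask are 2D lists of equal shape
    (hypothetical inputs, not the authors' data structures).
    """
    cy, cx = center
    n_rows, n_cols = len(hu_slice), len(hu_slice[0])
    values = []
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if (y - cy) ** 2 + (x - cx) ** 2 > radius ** 2:
                continue  # pixel lies outside the circle
            inside_image = 0 <= y < n_rows and 0 <= x < n_cols
            if not inside_image or not parenchyma_mask[y][x]:
                return None  # reject ROIs overlapping vessels, lesions, or background
            values.append(hu_slice[y][x])
    return mean(values)


# Toy 7x7 slice: uniform parenchyma at 55 HU with one bright vessel pixel.
hu = [[55] * 7 for _ in range(7)]
mask = [[True] * 7 for _ in range(7)]
hu[1][1] = 120       # hypothetical vessel pixel
mask[1][1] = False   # excluded from the parenchyma mask

roi_clean = circular_roi_mean_hu(hu, mask, (3, 3), 2)    # avoids the vessel
roi_rejected = circular_roi_mean_hu(hu, mask, (2, 2), 2)  # overlaps the vessel
```

In this toy case the first ROI is accepted and the second is rejected, so the vessel pixel never contributes to the measured liver attenuation.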


Supplementary Text 1: Dataset licenses and download links
Institutional review board (IRB) oversight was not required for our study, as our materials were collected from publicly available and deidentified datasets 1 . They were originally released in the following publications: LIDC-IDRI 2 , NSCLC-Lung1 3 , RIDER 4 , VESSEL12 5 , MIDRC-RICORD 6 , COVID-19-Italy 7 , and COVID-19-China 8 . The individual licenses for each dataset can be found in this supplementary text, along with the direct download links.
Accordingly, chest CT images were acquired from these public datasets, and our team of experts conducted manual liver segmentation for the development and validation of our AI methods.
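
For readers who reuse these data, the licenses stated in this supplementary text can be collected into a small lookup table. The sketch below is only a convenience summary: the dictionary and function names are hypothetical, the license strings are copied from the dataset descriptions in this text, and VESSEL12 is left as `None` because no license is named for it here.

```python
# License per dataset, as stated in this supplementary text.
DATASET_LICENSES = {
    "LIDC-IDRI": "CC BY 3.0",
    "NSCLC-Lung1": "CC BY-NC 3.0",
    "RIDER": "CC BY 3.0",
    "VESSEL12": None,  # no license is named in this text
    "MIDRC-RICORD": "CC BY-NC 4.0",
    "COVID-19-Italy": "CC BY-NC 4.0",
    "COVID-19-China": "CC BY 4.0",
}


def non_commercial_only(licenses):
    """Return the sorted names of datasets whose license carries the NC clause."""
    return sorted(
        name
        for name, license_id in licenses.items()
        if license_id is not None and "-NC" in license_id
    )


restricted = non_commercial_only(DATASET_LICENSES)
```

Here `restricted` lists the datasets limited to non-commercial use, which is the distinction most likely to matter when the annotated images are redistributed or used in product development.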

LIDC-IDRI 2 :
The Image Database Resource Initiative (IDRI) was created to advance the Lung Image Database Consortium (LIDC) in 2004. The data can be accessed here 9 : https://www.cancerimagingarchive.net/collection/lidc-idri/. Note that the accessible images are in DICOM format. The dataset is licensed under CC BY 3.0 (Creative Commons Attribution License 3.0), allowing various forms of use or re-use as long as due acknowledgment is made of the original source and authorship, with no additional restrictions.
This dataset was studied with approval from its respective licensing institution, along with informed consent from all subjects involved. The initial publication 2 mentioned that these CT scans were gathered and released with the necessary local Institutional Review Board (IRB) approval from the picture archiving and communications systems (PACS) of the seven participating academic institutions.

NSCLC-Lung1 3 :
NSCLC stands for non-small cell lung cancer. The Lung1 dataset, released in 2014, consists of 422 NSCLC patients in the Netherlands. The data can be downloaded here 10 : https://www.cancerimagingarchive.net/collection/nsclc-radiomics/. It is licensed under CC BY-NC 3.0 (Creative Commons Attribution-NonCommercial License 3.0), permitting various forms of use or re-use for non-commercial purposes, provided that due acknowledgment is made of the original source and authorship.
This dataset was studied with approval from its respective licensing committees, along with informed consent from all subjects involved. As stated in the initial publication 3 , this public dataset received approval from the Institutional Review Boards of all participating centers, with specific approval from the trial committee at Maastricht University Medical Center (MUMC+) in Maastricht, The Netherlands.

RIDER 4 :
The RIDER dataset consists of 31 patients with two CT scans acquired approximately 15 min apart. The chest CT images can be accessed here 11 : https://www.cancerimagingarchive.net/analysis-result/rider-lungct-seg/. The dataset is licensed under CC BY 3.0 (Creative Commons Attribution License 3.0), allowing various forms of use or re-use as long as due acknowledgment is made of the original source and authorship, with no additional restrictions.

VESSEL12 5 :
This dataset was collected from the VESsel SEgmentation in the Lung (VESSEL12) challenge held in 2012. The chest CT images can be accessed through the official challenge website (https://vessel12.grand-challenge.org/) or downloaded from the Kaggle dataset page (https://www.kaggle.com/datasets/andrewmvd/lung-vesselsegmentation).
This dataset was released to the public with approval from its respective licensing committees and/or institutions, along with informed/waived consent from all subjects involved. As indicated in the initial publication 5 , the scans utilized for this challenge were sourced from the anonymized image repositories of three hospitals: University Medical Center Utrecht (Utrecht, The Netherlands), the University Clinic of Navarra (Pamplona, Spain), and Radboud University Nijmegen Medical Centre (Nijmegen, The Netherlands). In instances where institutional ethics committee approval was mandated, written consent for retrospective studies had been previously acquired from each participant.

MIDRC-RICORD 6 :
The MIDRC-RICORD-1a can be downloaded here 12 : https://www.cancerimagingarchive.net/collection/midrc-ricord-1a/. The MIDRC-RICORD-1b can be downloaded here 13 : https://www.cancerimagingarchive.net/collection/midrc-ricord-1b/. The dataset is licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial License 4.0), permitting various forms of use or re-use for non-commercial purposes, provided that due acknowledgment is made of the original source and authorship.
This dataset was studied with approval from its respective licensing committees and/or institutions, along with informed/waived consent from all subjects involved. As indicated in the initial publication 6 , institutional review board (ethics committee) approval was obtained from all sites for this retrospective study. For the United States site, a waiver of informed consent was obtained, and processes were compliant with the Health Insurance Portability and Accountability Act. Moreover, this public dataset is open for non-commercial use, spanning research, education, and AI system development for various disease entities beyond COVID-19 pneumonia. Hence, we utilized this data for the AI assessment of hepatic steatosis in our study.

COVID-19-Italy 7 :
The dataset originally comprised 62 COVID-19-positive patients and was later enriched to 81 patients. Chest CT images can be downloaded here: https://www.imagenglab.com/newsite/covid-19/. As stated in the initial publication 7 , the dataset is released and licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial License 4.0), permitting various forms of use or re-use for non-commercial purposes, provided that due acknowledgment is made of the original source and authorship. The image collection was conducted with approval from the Hospital Ethics Committee, under the protocol number "Prot.308," as stated in the initial publication.

COVID-19-China 8 :
The dataset comprises 29 COVID-19-positive Chinese patients in Hubei Province, China. Chest CT images, referred to as 'the second dataset', can be downloaded here 14 : https://www.imagenglab.com/newsite/covid-19/. The dataset is licensed under CC BY 4.0 (Creative Commons Attribution License 4.0), allowing various forms of use or re-use as long as due acknowledgment is made of the original source and authorship, with no additional restrictions.
This dataset was studied with approval from its respective licensing committees and/or institutions, along with informed/waived consent from all subjects involved. The study 8 received approval (#20200702150947, #324-2020, 562-2020, and #335-2020) from Xiangyang No.1 People's Hospital Affiliated to Hubei University of Medicine in Xiangyang, Hubei, China, and from the University of Milan (Università degli Studi di Milano) Research Board/Institutional Review Board (IRB) in Milan, Italy. Because of the retrospective observational nature of the study, both institutions waived the informed consent requirement.

Supplementary Text 2: Participants and CT image details in public datasets

LIDC-IDRI 2 :
The Image Database Resource Initiative (IDRI) was created to further advance the Lung Image Database Consortium (LIDC) in 2004. The LIDC-IDRI contains a total of 1018 chest CT scans from 1010 patients, including both contrast-enhanced and nonenhanced CT scans. Images were collected from 7 participating academic institutions and 8 medical imaging companies in the USA. LIDC consists of diagnostic and lung cancer screening chest CT scans with annotated lung lesions. It was originally used to develop automated lung cancer detection and diagnosis. LIDC images were acquired on scanners from 4 manufacturers and 17 different CT models. The tube peak potential used for scan acquisition ranged from 120 to 140 kV. Tube current ranged from 40 to 627 mA. Slice thickness ranged from 0.6 to 4.0 mm. The reconstruction interval ranged from 0.45 to 5.0 mm. The in-plane pixel size ranged from 0.461 to 0.977 mm. Each CT scan was initially presented at a standard brightness/contrast setting without magnification. No participant demographics (age, gender, etc.) or clinical information is available for this dataset.

NSCLC-Lung1 3 :
NSCLC stands for non-small cell lung cancer. The Lung1 dataset, released in 2014, consists of 422 NSCLC patients in the Netherlands, of whom 132 are women and 290 are men. The mean age was 67.5 years (range: 33-91 years). Patients were included if they had a confirmed diagnosis of lung cancer or underwent treatment with curative intent. This dataset was initially proposed to assess the prognostic value of radiomic features for lung cancer. CT scans and clinical data were available in this study.

RIDER 4 :
The RIDER dataset consists of 31 patients with two CT scans acquired approximately 15 min apart. Patients with non-small cell lung cancer were recruited in 2007 at Memorial Sloan-Kettering Cancer Center, New York, USA. The mean age is 62.1 years (range, 29-82 years); 16 were men (mean age, 61.8 years; range, 29-79 years) and 16 were women (mean age, 62.4 years; range, 45-82 years). Parameters for the 16-detector row scanner were as follows: tube voltage, 120 kVp; tube current, 299-441 mA; detector configuration. Parameters of the 64-detector row scanner were as follows: tube voltage, 120 kV; tube current, 298-351 mA.

VESSEL12 5 :
This dataset was collected from the VESsel SEgmentation in the Lung (VESSEL12) challenge held in 2012, which aimed to compare automatic methods for lung vessel segmentation on scans from both healthy and diseased populations. CT scans were collected from three hospitals in the Netherlands and Spain using a variety of clinically common scanners and protocols. The dataset comprises 20 CT scans, around 10 of which contain abnormalities such as emphysema, nodules, or pulmonary embolisms.

MIDRC-RICORD 6 :
MIDRC stands for Medical Imaging Data Resource Center, and RICORD for RSNA International COVID-19 Open Radiology Database. This set included two subsets in April 2020: Release-1A: Chest CT COVID Positive (MIDRC-RICORD-1a) and Release-1B: Chest CT COVID Positive (MIDRC-RICORD-1b). Each dataset consists of 120 chest CT scans from four international sites: the USA, Turkey, Canada, and Brazil. The dataset has two inclusion criteria: (1) adults who underwent chest CT scans for suspected COVID-19 infection; (2) COVID-19 positivity (1A) confirmed by one or more of the following: reverse-transcription polymerase chain reaction test, immunoglobulin M antibody test, or clinical diagnosis using hospital-specific criteria.

COVID-19-Italy 7 :
The dataset originally comprised 62 COVID-19-positive patients and was later enriched to 81 patients. The group of 62 patients underwent non-contrast chest CT scans in Italy in 2020. The average age was 56 years (range 20-83), and the male/female ratio was 23/27. Images were obtained with two different scanners, with volumes reconstructed at 0.3 to 1 mm slice thickness. Automatic lung tissue classification, clinical score, and intensive care unit information are provided as well. We chose the enriched set with one CT scan per patient, yielding 81 CT images in our study.

COVID-19-China 8 :
The dataset comprises 29 COVID-19-positive Chinese patients who received multiple non-contrast chest CTs between January 21st and April 12th, 2020 in Hubei Province, China. The patients were predominantly female (69%, 20/29) and were 41 ± 10 years old (range 25 to 60 years). Each patient underwent multiple CT scans at different time points. We chose the baseline CT scan of each patient, yielding 29 CT images in our study.